Colander: Sifting Documents for Special Terms

Authors

  • Shridhara Aithal
  • Sudarshan Murthy
Abstract

We propose to demonstrate Colander, a set of lightweight approaches and a tool for mining “special terms” from a corpus of documents. We give an overview of three high-level approaches to extracting special terms, introduce metrics for measuring the performance of these approaches, and outline an evaluation methodology. We also provide a high-level description of the tool and its features, and illustrate both the approaches and the tool in an application that mines known and candidate trademarks used in documents.
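The abstract does not say what the three approaches or the metrics are, so the following is only an illustrative sketch under stated assumptions, not Colander's actual method. It assumes one possible lightweight approach (capitalized terms that carry a trademark symbol) and assumes precision and recall against a reference list as the metrics. The names `MARKED_TERM`, `extract_marked_terms`, and `precision_recall` are hypothetical.

```python
import re

# Assumed heuristic (not from the paper): one or more capitalized words
# immediately followed by a trademark or registered-mark symbol.
MARKED_TERM = re.compile(
    r"\b((?:[A-Z][\w-]*)(?: [A-Z][\w-]*)*)\s?(?:™|®|\(TM\)|\(R\))"
)


def extract_marked_terms(text):
    """Return the set of terms that carry a mark symbol in the text."""
    return {m.group(1) for m in MARKED_TERM.finditer(text)}


def precision_recall(extracted, known):
    """Score extracted terms against a reference set of known terms."""
    hits = len(extracted & known)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(known) if known else 0.0
    return precision, recall


if __name__ == "__main__":
    found = extract_marked_terms("We use the Acme Widget™ and FooBar® tools.")
    reference = {"Acme Widget", "Zeta"}  # made-up reference list
    print(sorted(found))
    print(precision_recall(found, reference))
```

In this sketch, extracted terms that appear in the reference list would count as known trademarks, and the remainder would be treated as candidates for review. A sentence-initial capitalized word directly before a marked term would be absorbed into the match, which is one reason such a heuristic needs evaluation.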


Similar resources

Photon-Number-Splitting-attack resistant Quantum Key Distribution Protocols without sifting

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau...


Higher-order frameworks for profiling and matching heterogeneous data

This thesis brings together complementary research from higher-order computational logic and workflow systems to investigate software and theoretical frameworks for profiling and matching heterogeneous data. A motivating use case is submission sifting, which matches submitted conference or journal papers to potential peer reviewers based on the similarity between the paper’s abstract and the re...


A Model Sifting Problem of Selberg

We study a model sifting problem introduced by Selberg, in which all of the primes have roughly the same size. We show that the Selberg lower bound sieve is asymptotically optimal in this setting, and we use this to give a new lower bound on the sifting limit βκ in terms of the sifting dimension κ. We also show that one can use a rounding procedure to improve on the Selberg lower bound sieve by...


Comparison of Texts Streams in the Presence of Mild Adversaries

Text sifting is a method of quickly and securely identifying documents for database searching, copy detection, duplicate email detection and plagiarism detection. A small amount of text is extracted from a document using hash functions and is used as the document’s fingerprint. We build upon previous work by Broder et al. [4,5] and Heintze [8], specifically addressing a certain set of attacks t...


Post Walrasian Macroeconomics and IS / LM Analysis

In recent work I have tried to spell out a Post Walrasian approach to macroeconomics (Colander, 1995a) and to translate that Post Walrasian vision into the aggregate supply/aggregate demand framework (Colander, 1995b). In this paper I continue that work and begin to relate the Post Walrasian vision to the standard IS/LM analysis. The paper is not about high theory; instead it is about the pedag...




Publication date: 2009